Popit Next Gen International Open Government Data Standards Toolkit

Author: Khairil Yusof khairil.yusof@sinarproject.org

Date: 2 July 2016

PopIt, a Popolo standards-based database and API, helps you create and maintain information about politicians, and generates open data to power your project.

In constrained environments where valuable information for governance is hard to get, the Popit database and API provide a public service for storing and sharing information from different organizations, following Popolo international open government data standards, so that people in these countries and environments can also benefit from open data. It creates an enabling environment for countries to gain the benefits of open data when their governments provide only limited open data on politicians, the legislature and government.

A central database is needed in countries with incomplete data sources, because data needs to be collected first. For countries like Malaysia, information for the database needs to be contributed from a variety of sources, through collaboration between different partners. Rather than a static set from a single source, data is constantly collected and improved until enough is available for it to be useful.

Popolo international open government standards are used for the database schema. When data comes from different sources and formats, it needs to be mapped to a consistent standard. When data is incomplete, a reference standard shows what data is still missing. A standard also allows tools and applications developed by others to be reused, while giving new developers an API with a consistent set of fields.
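As a minimal sketch of that mapping step, the snippet below renames the columns of one hypothetical partner record to Popolo Person property names and then reports which expected properties are still empty. The source column names (`full_name`, `sex`, `dob`) are invented for illustration, and the list of expected properties is only a small subset of the specification.

```python
# Hypothetical mapping from one partner's column names to Popolo Person properties.
FIELD_MAP = {"full_name": "name", "sex": "gender", "dob": "birth_date"}

# Illustrative subset of Popolo Person properties we would like to fill.
EXPECTED_FIELDS = ["name", "gender", "birth_date", "biography"]


def to_popolo_person(source_row, field_map=FIELD_MAP):
    """Rename source columns to Popolo property names, dropping empty values."""
    person = {}
    for source_key, popolo_key in field_map.items():
        value = source_row.get(source_key)
        if value not in (None, ""):
            person[popolo_key] = value
    return person


def missing_fields(person, expected=EXPECTED_FIELDS):
    """List the expected properties that have no value yet."""
    return [field for field in expected if not person.get(field)]


# Invented example row, not real data.
row = {"full_name": "Example Person", "sex": "female", "dob": ""}
person = to_popolo_person(row)
print(person)
print(missing_fields(person))
```

Each partner would keep its own mapping table, while the list of missing properties doubles as a to-do list for further data collection.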

Central reusable database of PEPs

To develop applications using joined-up data, it is important to have consistent unique identifiers. A central database provides these along with additional identifiers. A database of politically exposed persons (PEPs) and their memberships in committees, boards and organizations also provides an extensive reusable list, which removes the need to duplicate research work on persons and organizations such as politicians, senior public officials, constituencies and government departments.

Joined up data: Matching Government Linked Companies in Malaysia to database of known PEPs

A research team led by Professor Terence Gomez at University Malaya mapped the share ownership and directors of government-linked companies (GLCs) and government investment companies (GICs), and needed to match a few hundred names against a database of known politicians and government officials. The exercise quickly produced results, with brief descriptions of the matches. From it we learned from the research partner that, for transparency purposes, data on senior public officials is just as important as data on politicians. Members of the legal system, such as the judiciary and prosecutors, should also be added to the database. This would provide a good open data resource for transparency research.
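The report does not describe how the names were matched, so the following is only an illustrative sketch of one possible approach: normalizing names and using fuzzy string comparison from Python's standard library. All names are invented placeholders, and the `cutoff` value is an assumption that would need tuning against real data.

```python
import difflib


def normalize(name):
    """Lowercase and collapse whitespace so trivial differences do not block a match."""
    return " ".join(name.lower().split())


def match_names(candidates, known_names, cutoff=0.85):
    """Return {candidate: best matching known name, or None} using fuzzy matching."""
    lookup = {normalize(known): known for known in known_names}
    matches = {}
    for candidate in candidates:
        close = difflib.get_close_matches(
            normalize(candidate), list(lookup), n=1, cutoff=cutoff
        )
        matches[candidate] = lookup[close[0]] if close else None
    return matches


# Invented placeholder names, standing in for a PEP database and a list of directors.
known_peps = ["Ali Example", "Mei Ling Sample"]
directors = ["ALI   EXAMPLE", "Unknown Director"]
result = match_names(directors, known_peps)
print(result)
```

Fuzzy matches are only candidates: a person would still need to confirm each one, ideally against a unique identifier rather than the name alone.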

Collaboration: OpenHluttaw Myanmar using Popolo standards in sources of crowdsourced data

Myanmar held elections in 2015, for which data on elected representatives was collected by different parties. Partners Myanmar Fifth Estate and Open Myanmar Initiative collaborated to make multilingual legislative information available as public data, starting with their representatives and eventually adding linked Popolo data on motions and legislative documents. Through a standard public API, this same source of continuously improved legislative data can be reused in multiple ways by the public. Initially used for a parliamentary monitoring website, it can also be reused for statistics and for generating contact lists. In future the same database and API can be used as a data source for mobile apps, as joined-up data for transparency work, or with other Popit API tools such as Sinar's relationships viewer.

OpenHluttaw also tested a new method of importing and syncing data into the Popit database from a Google Docs spreadsheet that complied with Popolo standard fields. Initial results show that standards help ease collaboration and reuse of data in specific fields such as the legislature. The use of Google Docs with Popolo fields also lowers the barrier to entry for contributions to the central database. In future, the ability to download a CSV of complete or incomplete lists of names or members from the DB, with Popit IDs, would help improve the process of contributing to the central database.
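To illustrate why a CSV that carries Popit IDs would help, here is a minimal sketch, assuming a spreadsheet export whose column headers are Popolo property names plus an `id` column. The layout and sample rows are invented, and this is not the importer OpenHluttaw used. Rows with an ID can be treated as updates to existing records, and rows without one as new contributions.

```python
import csv
import io

# Stand-in for a CSV export of a spreadsheet with Popolo property names as headers.
SHEET = """id,name,gender,birth_date
abc123,Example Person One,female,1970-01-01
,Example Person Two,male,
"""


def split_rows(csv_text):
    """Separate rows that update existing records (have an id) from new ones."""
    updates, creates = [], []
    for row in csv.DictReader(io.StringIO(csv_text)):
        # Drop empty cells so they do not overwrite existing values.
        record = {key: value for key, value in row.items() if value}
        (updates if "id" in record else creates).append(record)
    return updates, creates


updates, creates = split_rows(SHEET)
print(updates)
print(creates)
```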

Limitations of Popolo Standards

Relation Class

https://github.com/popolo-project/popolo-spec/issues/56

  • Malaysia. Needed to map PEPs, which by definition include immediate relatives
  • Myanmar. Needed to map parents (awaiting an answer as to why)

Basic Resume Class

  • Malaysia. Complete biographies are available for only 66 of 222 Members of Parliament. Some basic information on qualifications would be helpful in auto-generating CVs. Partners are also interested in knowing the skill sets and professional qualifications of MPs. There are also known instances of fake degrees, so being able to store degree and university as structured data would be helpful for analysis
  • Myanmar. Also does not have full biographies, but has education and last position held, which partners wish to store in the Popit DB but which are not part of the spec

Improvements in the API/DB

  • The previous implementation covered only the subset of the Popolo standards required to map politicians. The goal of the new API is to implement the complete Popolo spec. The first new version implements all classes except Motions, Events, Votes and Speech
  • Better multilingual negotiation using standard language codes in the URL instead of headers
  • Per-field citations/source links to support data integrity and contributions
  • CORS (HTTP access control) support to enable JavaScript/browser-based rendering for web clients
  • Consistent naming convention for classes and results, e.g. class names such as persons and organizations are always plural
  • Empty fields always return a NULL value, and all fields are displayed even if empty. This removes a lot of key and value checking in client code
  • Expansion of nested objects in results to reduce lookups and make results easier to understand. Whether this is an improvement is debatable, as it can result in very large downloads
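The effect of always returning empty fields as NULL can be sketched from the client side. The response body below is a hypothetical example with invented values. Only the `result` wrapper is taken from the API calls in the appendix.

```python
import json

# Hypothetical response body; property names follow Popolo, values are invented.
body = '{"result": {"name": "Example Person", "gender": null, "birth_date": null}}'
person = json.loads(body)["result"]

# Every key is present, so direct indexing is safe; only the value needs checking.
gender = person["gender"] or "unknown"
summary = "{} ({})".format(person["name"], gender)
print(summary)
```

Without this guarantee, each access would need a key check such as `person.get("gender")` or a `try`/`except KeyError` block.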

Other database and framework implementation

Django REST Framework backed by a PostgreSQL database was chosen over the Node.js and MongoDB API for better data integrity, enforcing data types and foreign key relationships while keeping flexibility for other fields by storing JSON values, and GIS features with spatial object support.
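The data integrity argument can be illustrated with a small, self-contained sketch. It uses Python's built-in `sqlite3` module purely as a stand-in for PostgreSQL so that it runs anywhere, and the table and column names are simplified assumptions, not the actual Popit Next Gen schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys when asked; PostgreSQL enforces them by default.
conn.execute("PRAGMA foreign_keys = ON")
conn.execute("CREATE TABLE person (id TEXT PRIMARY KEY, name TEXT NOT NULL)")
conn.execute(
    "CREATE TABLE membership ("
    " id TEXT PRIMARY KEY,"
    " person_id TEXT NOT NULL REFERENCES person(id),"
    " role TEXT)"
)
conn.execute("INSERT INTO person VALUES ('p1', 'Example Person')")
conn.execute("INSERT INTO membership VALUES ('m1', 'p1', 'Member')")

try:
    # 'p2' does not exist, so the database refuses the dangling reference.
    conn.execute("INSERT INTO membership VALUES ('m2', 'p2', 'Member')")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

count = conn.execute("SELECT COUNT(*) FROM membership").fetchone()[0]
print(rejected, count)
```

A schemaless document store would accept the second membership as-is, leaving it to every client to detect the broken reference.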

MySociety made the same decision as Sinar for their YourNextRepresentative website, and developer Mark Longair lists in detail the technical reasons for this choice.

Resources

Deploying Popit Next Gen

Popit Next Gen source code can be downloaded from GitHub.

Organizations in countries that need support with setting up Popit DB and API in their country can contact team@sinarproject.org for support.

Current features

1. CRUD API for Person, Organization, Post and Membership, following the Popolo standard.
2. Implementation of Othername, ContactDetails, Area, Links and Identifier, following the Popolo standard.
3. Search API for Person, Organization, Post and Membership, including any embedded entity from 2.
4. Multilingual support for features 1, 2 and 3.
5. Support for JSON output.
6. Support for displaying the API in a browser.
7. Extensive unit tests for the supported features.
8. Links extended to support citations through an optional field value. There is no API to easily browse citations yet.
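Since there is no API for browsing citations yet, a client could collect them itself. The sketch below assumes that each link object carries its optional value under a key named `field`. That key name is a guess for illustration and may differ from the actual implementation, and the sample record is invented.

```python
def citations_for(entity, field_name):
    """Return URLs of links that cite the given field (assumes a 'field' key on links)."""
    return [
        link["url"]
        for link in entity.get("links", [])
        if link.get("field") == field_name
    ]


# Invented example record; example.org URLs are placeholders.
person = {
    "name": "Example Person",
    "birth_date": "1970-01-01",
    "links": [
        {"url": "https://example.org/profile", "note": "Official profile", "field": "birth_date"},
        {"url": "https://example.org/news", "note": "General coverage", "field": None},
    ],
}
sources = citations_for(person, "birth_date")
print(sources)
```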

Appendix

Using Open Hluttaw Popit API for data visualizations


In [9]:
import requests
import pandas

API = 'http://api.openhluttaw.org'

# Fetch the Amyotha Hluttaw organization record, including its memberships
amyotha_req = requests.get(API + '/en/organizations/897739b2831e41109713ac9d8a96c845')
memberships = amyotha_req.json()['result']['memberships']

amyotha = []

for member in memberships:
    # Skip memberships that are not held on behalf of an organization
    if not member['on_behalf_of_id']:
        continue
    r = requests.get(API + '/en/organizations/' + member['on_behalf_of_id'])
    party = r.json()['result']['name']
    if party:
        amyotha.append({'constituency': member['post']['label'],
                        'party': party})

amyotha_df = pandas.DataFrame(amyotha)

# Pie chart of the number of seats held by each party
%matplotlib inline
parties = amyotha_df['party']
pie = parties.value_counts()
pie.plot.pie(figsize=(10,10))


Out[9]:
<matplotlib.axes._subplots.AxesSubplot at 0x7f763292d790>
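The loop above issues one HTTP request per membership, although many members belong to the same party. A possible refinement, sketched here under the assumption that organization names do not change during the run, is to cache each lookup. The helper takes the fetch function as a parameter so that it can be tried without network access. `party_names` and `fake_fetch` are hypothetical names, not part of the Popit API.

```python
def party_names(memberships, fetch_name):
    """Map each membership to a party name, fetching every organization only once."""
    cache = {}
    names = []
    for member in memberships:
        org_id = member.get("on_behalf_of_id")
        if not org_id:
            continue  # no party recorded for this membership
        if org_id not in cache:
            cache[org_id] = fetch_name(org_id)
        names.append(cache[org_id])
    return names


# Offline stand-in for the HTTP lookup, recording which IDs were requested.
calls = []


def fake_fetch(org_id):
    calls.append(org_id)
    return {"a": "Party A", "b": "Party B"}[org_id]


sample = [
    {"on_behalf_of_id": "a"},
    {"on_behalf_of_id": "b"},
    {"on_behalf_of_id": "a"},
    {"on_behalf_of_id": None},
]
names = party_names(sample, fake_fetch)
print(names, calls)
```

With the live API, `fetch_name` would wrap the `requests.get` call on the organizations endpoint and return the `name` from its `result`.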

Multilingual Support

The API provides an easy way to update translatable fields for as many languages as needed, and this is supported by Elasticsearch. Getting and updating translated results is as simple as changing the language code in the URL.
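As a small sketch of what changing the language code looks like in client code, the helper below swaps the first path segment of an API URL. It assumes the URL layout seen in this appendix (`/<language>/<class>/<id>`), and `with_language` is a hypothetical helper name.

```python
from urllib.parse import urlsplit, urlunsplit


def with_language(url, lang):
    """Return the same URL with its first path segment replaced by `lang`."""
    parts = urlsplit(url)
    segments = parts.path.split("/")
    # segments[0] is empty because the path starts with "/"; segments[1] is the language.
    segments[1] = lang
    return urlunsplit(parts._replace(path="/".join(segments)))


english = "http://api.openhluttaw.org/en/organizations/897739b2831e41109713ac9d8a96c845"
burmese = with_language(english, "my")
print(burmese)
```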


In [10]:
import requests
import pandas

API = 'http://api.openhluttaw.org'

amyotha_req = requests.get(API + '/en/organizations/897739b2831e41109713ac9d8a96c845')
memberships = amyotha_req.json()['result']['memberships']

amyotha_my = []
for member in memberships:
    # Skip memberships that are not held on behalf of an organization
    if not member['on_behalf_of_id']:
        continue
    # Same lookup as before, but with the "my" language code to get Burmese names
    r = requests.get(API + '/my/organizations/' + member['on_behalf_of_id'])
    party = r.json()['result']['name']
    if party:
        amyotha_my.append({'constituency': member['post']['label'],
                           'party': party,
                           'gender': member['person']['gender'].lower()})

amyotha_my_df = pandas.DataFrame(amyotha_my)
amyotha_df_gender = amyotha_my_df.drop('constituency', axis=1)

# Count members by gender within each party
gender_counts = amyotha_df_gender.groupby('party')['gender'].value_counts()
gender_counts


Out[10]:
party                                           gender
ဇိုမီးဒီမိုကရေစီအဖွဲ့ချုပ်ပါတီ                  female      1
                                                male        1
တသီးပုဂ္ဂလ                                      male        2
တအောင်း (ပလောင်)အမျိုးသားပါတီ                   male        2
တိုင်းရင်းသားစည်းလုံးညီညွတ်ရေးပါတီ              male        1
ပအိုဝ်းအမျိုးသားအဖွဲ့ချုပ်(PNO)ပါတီ             male        1
ပြည်ထောင်စုကြံ့ခိုင်ရေးနှင့်ဖွံ့ဖြိုးရေးပါတီ    male       10
                                                female      1
မွန်အမျိုးသားပါတီ                               male        1
ရခိုင်အမျိုးသားပါတီ                             male        9
                                                female      1
ရှမ်းတိုင်းရင်းသားများဒီမိုကရေစီအဖွဲ့ချုပ်ပါတီ  male        3
အမျိုးသားဒီမိုကရေစီအဖွဲ့ချုပ်ပါတီ               male      116
                                                female     20
dtype: int64

In [11]:
import matplotlib

# Pivot the (party, gender) counts into one row per party, filling gaps with 0
gender_df = gender_counts.unstack(fill_value=0)
gender_df = gender_df.reindex(columns=['male', 'female'], fill_value=0)
gender_df.columns = ['ကျား', 'မ']  # Burmese labels for male and female

%matplotlib inline
matplotlib.rc('font', family='Padauk')  # Needed for proper rendering of Burmese characters
gender_df.plot.barh(stacked=True, figsize=(12,5))


Out[11]:
<matplotlib.axes._subplots.AxesSubplot at 0x7f76355d60d0>

Reusable Tools

With a standards-based API, some lower-level tools as well as some applications can be reused by different implementing partners.

Sinar Project Popit Relationship Explorer

The visual explorer tool for the Popit API/DB is a working proof of concept that lets users interactively explore relationships between PEPs and organizations with live data from the Popit API.

Source code: https://github.com/Sinar/popit_visualizer
